Search results for "String searching algorithm"
showing 10 items of 15 documents
Secure and Privacy Preserving Pattern Matching in Distributed Cloud-based Data Storage
2019
Given two strings: pattern $p$ of length $m$ and text $t$ of length $n$ . The string matching problem is to find all (or some) occurrences of the pattern $p$ in the text $t$ . We introduce a new simple data structure, called index arrays, and design fast privacy-preserving matching algorithm for string matching. The motivation behind introducing index arrays is determined by the need for pattern matching on distributed cloud-based datasets with semi-trusted cloud providers. It is intended to use encrypted index arrays both to improve performance and protect confidentiality and privacy of user data.
Efficient Algorithms for Sequence Analysis with Entropic Profiles
2017
Entropy, being closely related to repetitiveness and compressibility, is a widely used information-related measure to assess the degree of predictability of a sequence. Entropic profiles are based on information theory principles, and can be used to study the under-/over-representation of subwords, by also providing information about the scale of conserved DNA regions. Here, we focus on the algorithmic aspects related to entropic profiles. In particular, we propose linear time algorithms for their computation that rely on suffix-based data structures, more specifically on the truncated suffix tree (TST) and on the enhanced suffix array (ESA). We performed an extensive experimental campaign …
Effect of Practice, Mapping, Stimulus and Size on String Matching
1987
The same-different discrepancy on a matching task on which the subject had to determine the number of common elements (physically identical and appearing in the same position) between two strings of size 1 to 4 was investigated. Manipulated also were the type of presentation (fixed or varied sets), amount of practice (four blocks), and type of stimulus (letters, words). Reaction times for pure positive responses (all same at each level) were faster than negative responses (all different), confirming the usual discrepancy shown in previous studies. The discrepancy was smaller for well-learned sets (fixed sets) and for words, indicating the development of a comparison process based on global…
"Indexing structures for approximate string matching
2003
In this paper we give the first, to our knowledge, structures and corresponding algorithms for approximate indexing, by considering the Hamming distance, having the following properties. i) Their size is linear times a polylog of the size of the text on average. ii) For each pattern x, the time spent by our algorithms for finding the list occ(x) of all occurrences of a pattern x in the text, up to a certain distance, is proportional on average to |x| + |occ(x)|, under an additional but realistic hypothesis.
Multi-Dimensional Pattern Matching with Dimensional Wildcards: Data Structures and Optimal On-Line Search Algorithms
1997
We introduce a new multidimensional pattern matching problem that is a natural generalization of string matching, a well studied problem1. The motivation for its algorithmic study is mainly theoretical. LetA1:n1,?,1:nd be a text matrix withN=n1?ndentries andB1:m1,?,1:mr be a pattern matrix withM=m1?mrentries, whered?r?1 (the matrix entries are taken from an ordered alphabet ?). We study the problem of checking whether somer-dimensional submatrix ofAis equal toB(i.e., adecisionquery).Acan be preprocessed andBis given on-line. We define a new data structure for preprocessingAand propose CRCW-PRAM algorithms that build it inO(logN) time withN2/nmaxprocessors, wherenmax=max(n1,?,nd), such that …
Periodicity and repetitions in parameterized strings
2008
AbstractOne of the most beautiful and useful notions in the Mathematical Theory of Strings is that of a Period, i.e., an initial piece of a given string that can generate that string by repeating itself at regular intervals. Periods have an elegant mathematical structure and a wealth of applications [F. Mignosi and A. Restivo, Periodicity, Algebraic Combinatorics on Words, in: M. Lothaire (Ed.), Cambridge University Press, Cambridge, pp. 237–274, 2002]. At the hearth of their theory, there are two Periodicity Lemmas: one due to Lyndon and Schutzenberger [The equation aM=bNcP in a free group, Michigan Math. J. 9 (1962) 289–298], referred to as the Weak Version, and the other due to Fine and …
On Table Arrangements, Scrabble Freaks, and Jumbled Pattern Matching
2010
Given a string s, the Parikh vector of s, denoted p(s), counts the multiplicity of each character in s. Searching for a match of Parikh vector q (a “jumbled string”) in the text s requires to find a substring t of s with p(t) = q. The corresponding decision problem is to verify whether at least one such match exists. So, for example for the alphabet Σ = {a, b, c}, the string s = abaccbabaaa has Parikh vector p(s) = (6,3,2), and the Parikh vector q = (2,1,1) appears once in s in position (1,4). Like its more precise counterpart, the renown Exact String Matching, Jumbled Pattern Matching has ubiquitous applications, e.g., string matching with a dyslectic word processor, table rearrangements, …
On the Power of Non-adaptive Learning Graphs
2012
We introduce a notion of the quantum query complexity of a certificate structure. This is a formalisation of a well-known observation that many quantum query algorithms only require the knowledge of the disposition of possible certificates in the input string, not the precise values therein. Next, we derive a dual formulation of the complexity of a non-adaptive learning graph, and use it to show that non-adaptive learning graphs are tight for all certificate structures. By this, we mean that there exists a function possessing the certificate structure and such that a learning graph gives an optimal quantum query algorithm for it. For a special case of certificate structures generated by cer…
Binary jumbled string matching for highly run-length compressible texts
2012
The Binary Jumbled String Matching problem is defined as: Given a string $s$ over $\{a,b\}$ of length $n$ and a query $(x,y)$, with $x,y$ non-negative integers, decide whether $s$ has a substring $t$ with exactly $x$ $a$'s and $y$ $b$'s. Previous solutions created an index of size O(n) in a pre-processing step, which was then used to answer queries in constant time. The fastest algorithms for construction of this index have running time $O(n^2/\log n)$ [Burcsi et al., FUN 2010; Moosa and Rahman, IPL 2010], or $O(n^2/\log^2 n)$ in the word-RAM model [Moosa and Rahman, JDA 2012]. We propose an index constructed directly from the run-length encoding of $s$. The construction time of our index i…
From Nerode's congruence to Suffix Automata with mismatches
2009
AbstractIn this paper we focus on the minimal deterministic finite automaton Sk that recognizes the set of suffixes of a word w up to k errors. As first result we give a characterization of the Nerode’s right-invariant congruence that is associated with Sk. This result generalizes the classical characterization described in [A. Blumer, J. Blumer, D. Haussler, A. Ehrenfeucht, M. Chen, J. Seiferas, The smallest automaton recognizing the subwords of a text, Theoretical Computer Science, 40, 1985, 31–55]. As second result we present an algorithm that makes use of Sk to accept in an efficient way the language of all suffixes of w up to k errors in every window of size r of a text, where r is the…